Accuracy of two deep learning–based reconstruction methods compared with an adaptive statistical iterative reconstruction method for solid and ground-glass nodule volumetry on low-dose and ultra–low-dose chest computed tomography: A phantom study

No published studies have evaluated the accuracy of volumetric measurement of solid nodules and ground-glass nodules on low-dose or ultra–low-dose chest computed tomography, reconstructed using deep learning–based algorithms. This is an important issue in lung cancer screening. Our study aimed to investigate the accuracy of semiautomatic volume measurement of solid nodules and ground-glass nodules, using two deep learning–based image reconstruction algorithms (Truefidelity and ClariCT.AI), compared with iterative reconstruction (ASiR-V) in low-dose and ultra–low-dose settings. We performed computed tomography scans of solid nodules and ground-glass nodules of different diameters placed in a phantom at four radiation doses (120 kVp/220 mA, 120 kVp/90 mA, 120 kVp/40 mA, and 80 kVp/40 mA). Each scan was reconstructed using Truefidelity, ClariCT.AI, and ASiR-V. The solid nodule and ground-glass nodule volumes were measured semiautomatically. The gold-standard volumes could be calculated using the diameter since all nodule phantoms are perfectly spherical. Subsequently, absolute percentage measurement errors of the measured volumes were calculated. Image noise was also calculated. Across all nodules at all dose settings, the absolute percentage measurement errors of Truefidelity and ClariCT.AI were less than 11%; they were significantly lower with Truefidelity or ClariCT.AI than with ASiR-V (all P<0.05). The absolute percentage measurement errors for the smallest solid nodule (3 mm) reconstructed by Truefidelity or ClariCT.AI at all dose settings were significantly lower than those of this nodule reconstructed by ASiR-V (all P<0.05). Furthermore, the lowest absolute percentage measurement errors for ground-glass nodules were observed with Truefidelity or ClariCT.AI at all dose settings. 
The absolute percentage measurement errors for ground-glass nodules reconstructed with Truefidelity at ultra–low-dose settings were significantly lower than those of all sizes of ground-glass nodules reconstructed with ASiR-V (all P<0.05). Image noise was lowest with Truefidelity (all P<0.05). In conclusion, the deep learning–based algorithms were more accurate for volume measurements of both solid nodules and ground-glass nodules than ASiR-V at both low-dose and ultra–low-dose settings.

Mimicking the human thorax: Anthropomorphic thoracic phantom with synthetic lung nodules
A commercial multipurpose anthropomorphic chest phantom (Lungman; Kyoto Kagaku Co., Ltd, Kyoto, Japan) was used to mimic the human thorax. This phantom is a life-sized anatomical human male thorax model consisting of soft tissue substitute materials and synthetic bones, all of which show X-ray attenuation properties similar to their corresponding human tissues. Three-dimensional synthetic pulmonary vessels and bronchi were also inserted into the phantom for structural similarity.
In total, ten spherical synthetic pulmonary nodules were used; their characteristics are described in Table 1. They comprised four solid nodules (SNs; 3 mm, 5 mm, 8 mm, and 10 mm in diameter) and six ground-glass nodules (GGNs; 5 mm, 8 mm, and 10 mm in diameter at each of two attenuation values, −630 and −800 Hounsfield units [HU]). Since all nodule phantoms are perfectly spherical, the volume can be calculated using the diameter. The nodules were randomly placed and fixed within the phantom using double-sided tape.
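As a minimal sketch of how the reference volumes follow from the stated diameters, the snippet below applies the standard sphere formula V = πd³/6. The function name `sphere_volume_mm3` is illustrative and not taken from the study.

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a perfect sphere from its diameter: V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6.0


# Nodule diameters used in the phantom (mm).
for d in (3, 5, 8, 10):
    print(f"{d} mm diameter -> {sphere_volume_mm3(d):.1f} mm^3")
```

Because volume scales with the cube of the diameter, the 3-mm nodule has less than 3% of the volume of the 10-mm nodule, so a fixed absolute segmentation error translates into a much larger percentage error for small nodules.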

CT image acquisition
All CT images were obtained using a Revolution ES scanner (GE Healthcare, Chicago, IL, USA). Image acquisition was carried out using four different radiation doses (120 kVp/220 mA, 120 kVp/90 mA, 120 kVp/40 mA, and 80 kVp/40 mA). The study protocol and radiation dose data are summarized in Table 2. The volume CT dose index (CTDIvol) and dose length product (DLP) were recorded for all CT examinations, and the effective dose (ED) was calculated by multiplying the DLP by a conversion coefficient for chest CT (0.014 mSv/[mGy×cm]) [20]. The reconstruction kernel was "Lung". The scan parameters were as follows: noise index, 15; gantry rotation time, 0.35 s; coverage speed, 350 mm/s; pitch, 1.53:1; and slice thickness, 1.25 mm.
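The effective dose calculation described above can be sketched as follows. Only the conversion coefficient (0.014 mSv per mGy×cm) comes from the text; the DLP value in the usage line is a hypothetical number chosen for illustration, not a value from Table 2.

```python
# Chest CT conversion coefficient stated in the text, in mSv per (mGy*cm).
CHEST_CONVERSION_COEFFICIENT = 0.014


def effective_dose_msv(dlp_mgy_cm: float,
                       k: float = CHEST_CONVERSION_COEFFICIENT) -> float:
    """Effective dose (mSv) = conversion coefficient * dose length product."""
    return k * dlp_mgy_cm


# Hypothetical DLP for illustration only (not a study value).
example_dlp = 50.0
print(f"DLP {example_dlp} mGy*cm -> ED {effective_dose_msv(example_dlp):.2f} mSv")
```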

Image reconstruction algorithms
Each of the ten nodules was scanned at four radiation dose settings, and every scan was reconstructed with three algorithms: ASiR-V with a blending factor of 70%, Truefidelity (TFI) at the high strength level, and ClariCT.AI. This yielded a total of 120 reconstructed imaging datasets (10 nodules × 4 dose settings × 3 algorithms), all of which were analyzed.
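The count of 120 datasets can be checked by enumerating the full factorial design. This is only a bookkeeping sketch; the tuple layout and variable names are assumptions made for illustration.

```python
from itertools import product

# Four solid nodules plus three GGN sizes at two attenuation levels (ten in total).
nodules = [("SN", d, None) for d in (3, 5, 8, 10)]
nodules += [("GGN", d, hu) for d in (5, 8, 10) for hu in (-630, -800)]

dose_settings = ["120 kVp/220 mA", "120 kVp/90 mA", "120 kVp/40 mA", "80 kVp/40 mA"]
algorithms = ["ASiR-V", "TFI", "ClariCT.AI"]

# Every nodule is scanned at every dose and reconstructed with every algorithm.
datasets = list(product(nodules, dose_settings, algorithms))
print(len(nodules), "nodules ->", len(datasets), "reconstructed datasets")
```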

Nodule volume and image noise measurements
All nodule volumes were measured by two radiologists (T.K. and C.K., with 1 and 11 years of experience in thoracic imaging, respectively), so that interobserver variability could be determined, using commercially available software (Aquarius iNtuition Edition, TeraRecon, Foster City, CA, USA) that has previously been used for nodule volumetry in several studies [3,21,22]. Semiautomatic nodule segmentation was performed by clicking at the center of each nodule. The default segmentation attenuation thresholds for SNs and GGNs were −300 HU and −850 HU, respectively. The radiologists adjusted these thresholds further if, in consensus, they judged the software-based nodule segmentation to be inadequate. However, most segmentation procedures were performed with a single click without manual modification. These measurements were repeated four times.
The absolute percentage measurement error (APE), defined as the absolute difference between the measured volume and the reference volume expressed as a percentage of the reference volume, was calculated for analysis, as described previously [3,22]. The reference volume can be calculated using the diameter since all nodule phantoms are perfectly spherical. The APE of each nodule volume was calculated as follows: APE (%) = |measured nodule volume with each algorithm − reference nodule volume| × 100 / reference nodule volume. The volume of every nodule, regardless of size and type, was measured four times, and an APE value was calculated from each measurement using the reference volume of that nodule. The mean and standard deviation (SD) of the APE values of each nodule were then calculated, and APEs are presented as mean ± SD. The results are presented by nodule type (across all nodules, all SNs, all GGNs), by both nodule size and nodule type, and by the two attenuation levels (−630 HU and −800 HU) for GGNs.
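The APE definition and the mean ± SD summary over repeated measurements can be sketched as below. The measured volumes are hypothetical values for a 5-mm nodule, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say whether the sample or population SD was reported.

```python
import math
from statistics import mean, stdev


def ape_percent(measured_mm3: float, reference_mm3: float) -> float:
    """Absolute percentage error: |measured - reference| * 100 / reference."""
    return abs(measured_mm3 - reference_mm3) * 100.0 / reference_mm3


def summarize_ape(measurements_mm3, diameter_mm):
    """Mean and sample SD of the APE over repeated measurements of one nodule."""
    reference = math.pi * diameter_mm ** 3 / 6.0  # sphere volume from diameter
    apes = [ape_percent(v, reference) for v in measurements_mm3]
    return mean(apes), stdev(apes)


# Hypothetical four repeated measurements (mm^3) of a 5-mm nodule.
repeated = [68.0, 63.1, 66.2, 70.0]
m, sd = summarize_ape(repeated, diameter_mm=5)
print(f"APE = {m:.2f} ± {sd:.2f} %")
```

Because the absolute value is taken before averaging, over- and underestimates do not cancel, so the mean APE reflects error magnitude rather than bias.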
Image noise was also calculated to assess the image quality by averaging three different SDs of attenuation, as described previously [3]: two values were from both lung fields of the phantom (right posteromedial lung field near the mediastinum and left posterolateral lung field near the thoracic wall at the level of the heart), and one value was from the room air outside of the chest wall (3 cm away from the anteromedial chest wall). A circular region of interest (ROI) with an area of 120 mm² was used.
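A minimal sketch of this noise metric is shown below: the SD of attenuation is computed within each of the three ROIs and the three SDs are averaged. The HU samples are made-up values, and the choice of the population SD (`statistics.pstdev`) is an assumption. The helper `roi_radius_mm` simply converts the stated 120 mm² circular area into a radius.

```python
import math
from statistics import mean, pstdev


def roi_radius_mm(area_mm2: float = 120.0) -> float:
    """Radius of a circular ROI with the given area (A = pi * r^2)."""
    return math.sqrt(area_mm2 / math.pi)


def image_noise_hu(roi_samples) -> float:
    """Average of the per-ROI standard deviations of attenuation (HU)."""
    return mean(pstdev(values) for values in roi_samples)


# Made-up HU samples for the two lung-field ROIs and the room-air ROI.
rois = [
    [-1002.0, -995.0, -1010.0, -990.0],
    [-1001.0, -1008.0, -992.0, -999.0],
    [-1000.0, -1004.0, -996.0, -1001.0],
]
print(f"ROI radius: {roi_radius_mm():.2f} mm")
print(f"Image noise: {image_noise_hu(rois):.2f} HU")
```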

Task-based transfer function (TTF) and noise power spectrum (NPS)
TTF and NPS were also analyzed using an ACR (American College of Radiology) CT certified phantom (Gammex 464, Sun Nuclear, Middleton, WI, USA) to evaluate quantitative image quality at the different radiation dose levels. The phantom consists of four modules: NPS was measured in module 3, which contains a homogeneous medium, and TTF was measured in module 1, which includes cylindrical inserts of various materials. In this experiment, TTF was measured using the bone and acrylic cylindrical inserts. To quantify TTF, we determined TTF50%, the spatial frequency at which the measured TTF curve falls to 0.5. Additionally, we implemented the widely used three-dimensional (3D) NPS following the method presented by the American Association of Physicists in Medicine (AAPM) [23]. TTF was calculated using imQuest software (Duke University, Durham, NC, USA), which is implemented in Matlab (Version R2017a, The MathWorks, Inc., Natick, MA, USA), and the NPS of the ACR phantom images was calculated using our own Matlab implementation.
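The TTF50% definition can be illustrated with a short sketch that finds where a sampled TTF curve first falls to 0.5, using linear interpolation between neighboring samples. This is not the imQuest implementation; the function name, the interpolation scheme, and the sample curve are assumptions for illustration.

```python
def ttf50(frequencies, ttf_values):
    """Spatial frequency at which the TTF curve first falls to 0.5.

    Uses linear interpolation between the two samples that bracket 0.5.
    Returns None if the curve never falls to 0.5 within the sampled range.
    """
    for i in range(1, len(frequencies)):
        y0, y1 = ttf_values[i - 1], ttf_values[i]
        if y0 >= 0.5 >= y1 and y0 != y1:
            x0, x1 = frequencies[i - 1], frequencies[i]
            return x0 + (y0 - 0.5) * (x1 - x0) / (y0 - y1)
    return None


# Made-up TTF curve sampled at a few spatial frequencies (e.g., cycles/mm).
freqs = [0.0, 0.2, 0.4, 0.6]
curve = [1.0, 0.8, 0.4, 0.1]
print(ttf50(freqs, curve))
```

A higher TTF50% means the curve stays above 0.5 out to higher spatial frequencies, which is why it serves as a single-number summary of spatial resolution.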

Statistical analysis
For the analysis of repeated measures data, repeated measures analysis of variance (RM ANOVA) was performed as a parametric test, or Friedman's test as a nonparametric test. Bonferroni correction was used to adjust the significance level and confidence interval for multiple comparisons of main effects. When the sphericity assumption was not met according to Mauchly's test of sphericity, the P-value from the Greenhouse-Geisser correction was used for tests of within-subject effects. Intraclass correlation coefficient (ICC) analysis was used to evaluate interobserver variability. ICC results were interpreted as follows: <0.40, poor agreement; 0.40-0.59, fair agreement; 0.60-0.74, good agreement; and 0.75-1.00, excellent agreement. A P-value less than 0.05 was considered statistically significant. All statistical analyses were performed using SPSS Statistics for Windows, version 25 (IBM Corp., Armonk, NY, USA).
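Two of the conventions above lend themselves to a short sketch: the ICC interpretation bands and the Bonferroni-adjusted significance level. The analyses in the study were run in SPSS; the functions below are illustrative only. Treating the band edges as half-open intervals (so that a value such as 0.595 is classed as fair) is an assumption, as is the example of three pairwise comparisons among three algorithms.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC value to the agreement categories listed in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


print(interpret_icc(0.82))
# Three pairwise comparisons among three algorithms (illustrative).
print(f"{bonferroni_alpha(0.05, 3):.4f}")
```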

Mean APE according to nodule type
The APEs for different nodule types are shown in Table 3. The mean APEs of TFI and ClariCT.AI across all nodules at all dose settings were less than 11%.
The mean APEs of SNs at all dose settings were the lowest with ClariCT.AI, with a significant difference at 80 kVp/40 mA (CTDIvol, 0.2 mGy) (P<0.05).

PLOS ONE
Accuracy of AI-based reconstruction on low-dose CT

TTF and NPS analysis
The TTF curves for all reconstruction types with the bone and acrylic inserts are shown in S1 and S2 Figs. S1 Table reports the TTF50% values for both inserts, and TTF50% [...] Fig. shows the NPS curves for all reconstruction types and all dose settings. The NPS peak increased as the dose decreased. NPS peaks were lower with TFI and ClariCT.AI than with ASiR-V, and TFI yielded the lowest NPS peak among the reconstruction algorithms at all dose settings. At all dose settings, the average spatial frequency of the NPS shifted toward a lower value with ClariCT.AI than with ASiR-V, whereas TFI yielded a level similar to that of ASiR-V.

Discussion
Although recent studies have revealed improved image quality or increased lesion detectability with deep learning image reconstruction compared with iterative reconstruction (IR), no published studies have evaluated the accuracy of pulmonary nodule volume measurement [6,7]. Greffier et al. performed a phantom study and demonstrated increased detectability with TFI relative to ASiR-V, permitting dose reduction [7]. However, the lesions in the phantom were not pulmonary nodules but a large mass in the liver, a small calcification, and a small subtle lesion with low contrast. The study performed by Kim et al. showed that TFI was superior to ASiR-V for identifying anatomic structures in the human thorax, such as the pulmonary arteries/veins, trachea/bronchi, lymph nodes, and pleura/pericardium, and yielded a higher signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), but they did not analyze detectability or volumetry of lung nodules [11]. Hata et al. found that TFI was associated with significantly less noise, a higher SNR/CNR, and finer image texture than ASiR-V [9]. They also demonstrated that the combination of deep learning-based denoising and IR improved image quality and Lung Imaging Reporting and Data System (Lung-RADS) evaluation on ultra-low-dose CT [18].
Due to technical factors affecting the reliability of nodule volumetry (e.g., interobserver variability between radiologists, software packages, CT manufacturers, intravenous contrast material, inspiratory effort), approximately 5-25% variation has been reported between different CT scans of the same nodule performed on the same day [24]. Therefore, a volume change <25% may be due to interscan variability and is considered "absence of nodule growth", whereas a volume change >100% usually indicates obvious nodule growth [24]. Against this background, the volumetric measurement errors of TFI and ClariCT.AI in our study, which were less than 11% across all nodules at all dose settings, might be acceptable. In general, with some exceptions, the smaller the nodule, the larger the volumetric measurement error at all dose settings for both SNs and GGNs. For example, the mean APEs of 3-mm SNs at 80 kVp/40 mA (CTDIvol, 0.2 mGy) were 26.89 ± 6.80 with ASiR-V, 17.59 ± 6.07 with TFI, and 17.96 ± 8.58 with ClariCT.AI, whereas the mean APEs of 10-mm SNs were 9.44 ± 7.60 with ASiR-V, 5.38 ± 6.68 with TFI, and 5.90 ± 2.63 with ClariCT.AI, suggesting that the smaller the nodule, the more carefully the results of volumetric measurement should be interpreted. However, in our study, among all APEs of SNs and GGNs, only some ASiR-V instances yielded volumetric measurement errors of 25% or more.
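The growth thresholds cited above can be expressed as a small sketch. The two cut-offs (25% and 100%) come from the text; the "indeterminate" label for everything in between, the use of the magnitude of the change for the 25% rule, and the function names are assumptions added for illustration and are not a clinical rule.

```python
def percent_volume_change(baseline_mm3: float, followup_mm3: float) -> float:
    """Signed volume change relative to the baseline volume, in percent."""
    return (followup_mm3 - baseline_mm3) * 100.0 / baseline_mm3


def classify_growth(change_percent: float) -> str:
    """Label a volume change using the 25% and 100% thresholds from the text."""
    if abs(change_percent) < 25.0:
        return "no growth (within interscan variability)"
    if change_percent > 100.0:
        return "obvious growth"
    return "indeterminate"  # assumed label; the text does not name this range


# Hypothetical baseline and follow-up volumes (mm^3).
change = percent_volume_change(100.0, 110.0)
print(f"{change:.1f}% -> {classify_growth(change)}")
```

Under this reading, a measurement error below 11% stays well inside the 25% band, which is the sense in which the errors reported for TFI and ClariCT.AI might be acceptable.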
The present study showed significantly improved accuracy of volumetry of SNs and GGNs using TFI or ClariCT.AI compared with ASiR-V. Additionally, for 5-mm, 8-mm, and 10-mm SNs, the volumetric measurement errors of TFI and ClariCT.AI in low-dose and ultra-low-dose settings were less than 10%. For the 3-mm SN, the volumetric measurement errors of TFI were less than 20%, and those of ClariCT.AI were less than 10% except at a dose setting of 80 kVp/40 mA (CTDIvol, 0.2 mGy); however, TFI yielded a significantly lower APE than ASiR-V (P<0.05). Therefore, ultra-low-dose CT scans of less than 1 mSv reconstructed with TFI or ClariCT.AI could be used for follow-up of SNs measuring at least 5 mm, and scans obtained at a dose setting of at least 120 kVp/40 mA (CTDIvol, 0.62 mGy) and reconstructed with TFI or ClariCT.AI might be acceptable for follow-up of SNs measuring 3 mm.
For GGN volumetry, we used two types of GGNs with different attenuations (−630 HU and −800 HU). For all reconstructions at all dose settings, the measurement error tended to be higher for −800 HU GGNs than for −630 HU GGNs, probably because −800 HU GGNs are fainter. However, the volumetric measurement errors associated with all sizes of GGNs at all dose settings assessed with TFI were less than 10%. Additionally, at dose settings of 120 kVp/220 mA (CTDIvol, 3.39 mGy) and 120 kVp/90 mA (CTDIvol, 1.39 mGy), the volumetric measurement errors for all sizes of GGNs evaluated with ClariCT.AI were less than 10%. TFI and ClariCT.AI yielded significantly lower APEs at 120 kVp/40 mA (CTDIvol, 0.62 mGy) and 80 kVp/40 mA (CTDIvol, 0.2 mGy) for all sizes of −630 and −800 HU GGNs. For follow-up of GGNs measuring at least 5 mm with an attenuation of −800 HU or higher, ultra-low-dose CT scans performed at less than 1 mSv and reconstructed with TFI could be useful, and low-dose CT scans performed using dose settings of at least 120 kVp/90 mA (CTDIvol, 1.39 mGy) and reconstructed with ClariCT.AI might be acceptable.
TTF is a representative metric of spatial resolution, and the TTF curve describes how well the contrast of the original object is preserved across spatial frequencies [25]. Therefore, a higher TTF value means better spatial resolution. In our study, for the bone insert, TTF50% values were higher with TFI and ClariCT.AI than with ASiR-V at all dose settings, and TFI yielded the highest TTF50% values among the reconstruction algorithms at all dose settings, indicating the better spatial resolution of the deep learning-based algorithms. NPS describes both the amount of noise (magnitude) and the noise characteristics (texture) in the spatial frequency domain [26]. It overcomes the drawback of using pixel SD alone to evaluate image noise and is commonly used for image quality evaluation, optimization, and image-to-image comparison. In our study, NPS peaks were lower with TFI and ClariCT.AI than with ASiR-V, and TFI yielded the lowest NPS peak among the reconstruction algorithms at all dose settings while maintaining texture. Similar results have been reported in previous studies. Greffier et al. assessed the impact of TFI on image quality and radiation dose (compared with a hybrid IR algorithm) using TTF and NPS, and they demonstrated that TFI reduced noise and improved spatial resolution and detectability without perceived alteration of the texture, which remained similar to that of conventional IR [7]. ClariCT.AI also improved spatial resolution characteristics and slightly lowered noise compared with conventional IR [27]. Based on these results, the improved accuracy of nodule volumetry observed in our study might be attributable to these properties of the deep learning-based reconstruction algorithms.
TFI is a deep learning-based reconstruction algorithm developed by a specific CT manufacturer (GE Healthcare). Currently, TFI can only be applied to a specific CT scanner made by GE Healthcare. On the other hand, ClariCT.AI is a deep learning-based denoising software that has the advantage of being applicable to images obtained by all CT scanners. In this study, the accuracy of lung nodule volumetry with TFI and ClariCT.AI was significantly superior to that with IR, but TFI could be applied at lower radiation doses than ClariCT.AI for follow-up of 5-mm GGNs. TFI was also associated with significantly lower image noise than ClariCT.AI at 120 kVp/90 mA (CTDIvol, 1.39 mGy) and 120 kVp/40 mA (CTDIvol, 0.62 mGy). Despite these limitations, ClariCT.AI was also sufficiently accurate for nodule volumetry in low-dose and ultra-low-dose settings. Further research including images obtained by scanners from other vendors is needed.
There were several limitations to our study. First, we did not evaluate 3-mm GGNs because none were available to us. However, according to lung nodule management guidelines from the Fleischner Society, no routine follow-up is generally recommended for pure GGNs measuring 6 mm or less in diameter. Furthermore, we demonstrated volumetric accuracy using 5-mm GGNs. Second, the lack of real lung parenchyma in the chest phantom may have contributed to increased measurement reproducibility. Third, only a 1.25-mm slice thickness and the "Lung" kernel were used for reconstruction in this study; therefore, the generalizability of our results to other slice thicknesses or reconstruction kernels could not be assessed. We used a 1.25-mm slice thickness and the "Lung" kernel for all CT image datasets because these conditions are recommended by the Korean Lung Cancer Screening Project (K-LUCAS) [28]. As the purpose of the present study was to compare the accuracy of two deep learning-based algorithms with that of IR for volume measurement of SNs and GGNs of various sizes at low-dose and ultra-low-dose settings, the CT protocol (1.25-mm slice thickness and "Lung" kernel for reconstruction) remained unchanged. However, further research on this topic is warranted.

Conclusion
In conclusion, TFI and ClariCT.AI were more accurate than ASiR-V for obtaining volume measurements of both SNs and GGNs using low-dose and ultra-low-dose settings. The volumetric measurement errors associated with TFI and ClariCT.AI across all nodules at all dose settings were less than 11%. Ultra-low-dose CT scans reconstructed with TFI or ClariCT.AI could be used for follow-up of SNs measuring at least 5 mm. For follow-up of GGNs measuring at least 5 mm with an attenuation of −800 HU or higher, ultra-low-dose CT performed at less than 1 mSv and reconstructed with TFI could be useful, and low-dose CT performed at dose settings of at least 120 kVp/90 mA (CTDIvol, 1.39 mGy) and reconstructed with ClariCT.AI might be acceptable. TFI was also superior to ASiR-V with regard to image noise when using ultra-low-dose CT.